Extraction of Cross Language Term Correspondences
Abstract
This paper describes a method for extracting translations of terms across languages using parallel corpora. The extracted term correspondences are useful for query expansion in cross-language information retrieval and for bilingual lexicon extraction. The method uses the mutual information measure and allows mapping from single-word to multi-word terms and vice versa. The method is scalable (it accommodates the addition or removal of data) and produces high-quality results while keeping computational costs low enough to allow on-the-fly translation in, for example, cross-language information retrieval systems. The work was carried out in collaboration with Intrafind Software AG (Munich, Germany).
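The abstract names the mutual information measure but does not give its exact formulation, so the following is only a minimal sketch of one common variant, pointwise mutual information over a sentence-aligned corpus. It assumes terms have already been extracted on each side, and it treats a multi-word term as a single string. The function names `pmi_table` and `best_translation` and the toy corpus are hypothetical, not taken from the paper.

```python
import math
from collections import Counter
from itertools import product


def pmi_table(aligned_pairs):
    """Score (source term, target term) pairs by pointwise mutual information.

    aligned_pairs: list of (source_terms, target_terms), one entry per
    aligned sentence pair. A term is any string, so a multi-word term such
    as "information retrieval" is handled like a single word.
    Assumption: co-occurrence is counted once per aligned sentence pair.
    """
    n = len(aligned_pairs)
    src_count, tgt_count, joint_count = Counter(), Counter(), Counter()
    for src_terms, tgt_terms in aligned_pairs:
        src_set, tgt_set = set(src_terms), set(tgt_terms)
        src_count.update(src_set)
        tgt_count.update(tgt_set)
        joint_count.update(product(src_set, tgt_set))

    scores = {}
    for (s, t), c in joint_count.items():
        # PMI = log2( p(s, t) / (p(s) * p(t)) ), with probabilities
        # estimated as relative frequencies over the n aligned pairs.
        scores[(s, t)] = math.log2(c * n / (src_count[s] * tgt_count[t]))
    return scores


def best_translation(scores, source_term):
    """Return the target term with the highest PMI for source_term, or None."""
    candidates = {t: v for (s, t), v in scores.items() if s == source_term}
    return max(candidates, key=candidates.get) if candidates else None


# Hypothetical toy corpus: English terms aligned with German terms.
corpus = [
    (["house", "red"], ["haus", "rot"]),
    (["house", "big"], ["haus", "gross"]),
    (["car", "red"], ["auto", "rot"]),
    (["car", "big"], ["auto", "gross"]),
]
scores = pmi_table(corpus)
print(best_translation(scores, "house"))
```

Because the scores depend only on counts, adding or removing aligned pairs amounts to incrementing or decrementing the three counters, which is one plausible way to read the scalability claim above.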
Similar resources
Effect of Cross-Language IR in Bilingual Lexicon Acquisition from Comparable Corpora
Within the framework of translation knowledge acquisition from WWW news sites, this paper studies issues on the effect of cross-language retrieval of relevant texts in bilingual lexicon acquisition from comparable corpora. We experimentally show that it is quite effective to reduce the candidate bilingual term pairs against which bilingual term correspondences are estimated, in terms of both co...
Report on CLEF-2003 Experiments: Two Ways of Extracting Multilingual Resources from Corpora
We present in this report two main approaches to cross-language information retrieval based on the exploitation of multilingual corpora to derive cross-lingual term-term correspondences. These two approaches are evaluated in the framework of the multilingual-4 (ML4) task.
Social Network Extraction and Exploration of Historic Correspondences
Driven by the continuously increasing number of digitized and transcribed historic documents, natural language processing (NLP) and text analysis tasks are now frequently applied to historic texts to extract useful information and thus to enrich this cultural heritage. These tasks face several challenges, such as dealing with spelling variations, lack of orthography, and, oftentimes, missing re...
Transfer Learning for Cross-Language Text Categorization through Active Correspondences Construction
Most existing heterogeneous transfer learning (HTL) methods for cross-language text classification rely on sufficient cross-domain instance correspondences to learn a mapping across heterogeneous feature spaces, and assume that such correspondences are given in advance. However, in practice, correspondences between domains are usually unknown. In this case, extensive manual efforts are requir...
Computational Lexicography and Lexicology Elexbi, a Basic Tool for Bilingual Term Extraction from Spanish-Basque Parallel Corpora
We present the work done by Elhuyar Foundation in the field of bilingual terminology extraction. The aim of this work is to develop some techniques for the automatic extraction of pairs of equivalent terms from Spanish-Basque translation memories, and to implement those techniques in a prototype. Our approach is based on a monolingual extraction of term candidates in each language, then the creati...
Publication date: 2006